Improving Inversions of the Overlap Operator 

S. Krieg*"*, N. Cundy'* \ J. van den Eshof*", A. Frommer'^, Th. Lipperf^, and K. Schafer''. 
'^Department of Physics, Universitat Wuppertal, Gaussstrasse 19, Wuppertal, Germany. 
''Department of Mathematics , University of Diisseldorf, Germany. 
■^Department of Mathematics, Universitat WuppertaL 

''John von Neumann Institute for Computing, Jiihch Research Centre, 52425 Jiihch, Germany. 

We present relaxation and preconditioning techniques which accelerate the inversion of the overlap operator by 
a factor of four on small lattices, with larger gains as the lattice size increases. These improvements can be used 
in both propagator calculations and dynamical simulations. 



1. INTRODUCTION 

The massive overlap operator can be written as 

pi + 75sign((9), 

where p > 1 is a mass parameter corresponding 
to a bare fermion mass to^ = m{p — 1), and Q 
is the hermitian Wilson operator with a negative 
mass parameter —to. Note that 75 sign((5) is uni- 
tary. We can also define a hermitian overlap op- 
erator Dh — j^Du- Both operators are normal, 
i.e. they commute with their adjoints. We will 
approximate the matrix sign function using the 
Zolotarev partial fraction expansion (ZPFE), 



sign(Q) 



Nz 



where the coefficients tUi and ti are known PP, 
and the accuracy of the approximation depends 
on the Zolotarev order Nz- Multiplication of the 
ZPFE with a vector is easily implemented using 
a multishift CG-inversion |2]. 

In these proceedings, we will sketch how to op- 
timise the inversion of D„ (the propagator calcu- 
lation), and (the full inversion). Both these 
inversions can be reduced to solving for the vector 
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X in any of the three following equations: 

DuX = 5, (1) 
DhX^ 75^ (2) 
DhDhX = b. (3) 

2. SUMR 

The optimal method to solve equation Q is 
well known to be the minimal residual (MIN- 
RES). Similarly, for equation ||2Jl the conjugate 
gradient (CG) method is optimal. Less well 
known is the shifted unitary minimal residual 
(SUMR) method |8l4j . which can be used to in- 
vert equation There are known formulae for 
the accuracy of the inversion for each of these 
methods after k iterations, which can be used to 
calculate a worst case bound for the number of 
iterations fc(e,p) needed to achieve an accuracy e 
a:* is the true solution, x'^ the approximate 
inverse, and r'' = D^x^ — 6 is the residual. 

(i) For SUMR we have 



x^~x* 



< 



< 



k{e,p) < 



-He) , -ln(2/(p-l)) 



ln(p) \n{p) 
(ii) For MINRES we use ||r0||2 = ||rj;||2, since 



1 



2 




multiplications with Q 



Figure 1 . The residual of a SUMR inversion com- 
pared to CG and MINRES on an dynamical 8"* 
configuration, plotted against calls to the Wilson 
operator. 



to the product of the overlap operator with a 
vector . Here rj^ is the relative accuracy of the 
sign function: 

\\D^y^ -sm<^,-\\Dj-\\y^\\. 

The precision t] can be fixed at a value less than 
e, but it is more efficient to increase rj as the in- 
version progresses. The key is to ensure that the 
residual gap 

\\b^_Axl^\\ < \\r'' - {b- Ax'')\\ 

true residual residual gap 

+ ll^ll 

computed residual 

at the end of the calculation is of order e. The 
optimal relaxation strategies are 



rl = 75?-°, 



x'^-x* 1|2< 



p-1 



llJ 



k{e,p) <2 



P- 1 VP 
-ln(£) , -ln(2/(p-l)) 



II rl II2, 



Hp) 



Hp) 



(iii) For CG (eqn. Q, with two calls to Dh per 
iteration) we have 



X"~X* 2< 



k{s,p) < 



pfe(p-l) 
-ln(£) , -ln(2/(p-l)) 



Hp) 



Hp) 



Based on these worst case estimates, SUMR is a 
factor 2 faster than CG or MINRES for a prop- 
agator calculation. We see this factor of 2 im- 
provement numerically (see figure^. 

3. RELAXATION 

While solving D^x ~ b, there is no need to 
calculate the sign function to the full accuracy (= 
e) during the entire inversion. For each step of the 
inversion we have to calculate an approximation 



method 


tolerance rjj 


CG 




MINRES 


Vj = e/lk^JI 


SUMR 


Vj = e/lk^ll 



Yy Ik" II2, 



The accuracy of the sign function can be relaxed 
by reducing the accuracy of the multishift solver 
used to solve the ZPFE, and by reducing Nz- 
This relaxation can be applied to CG (we call 
the relaxed CG inversion "relCG") and SUMR 
("relSUMR"). Relaxation gains a factor of 1.5-1.8 
in computer time (see section ^J. 

4. JVE PRECONDITIONING 

We can achieve further gains by using inversion 
of a low-accuracy overlap operator as a precon- 
ditioner; we use relSUMR (as preconditioner for 
the propagator inversion) or relCG (for the full 
inversion) with the relative accuracy e = 0.01, 
and Nz = 5 (these numbers can be optimized). 
This preconditioner can be used in an inversion 
algorithm which can support a variable precon- 
ditioner: we used the GMRESR algorithm |5l6j . 
The GMRESR inversion can be relaxed, using 
the methods of the previous section, so we have 
chosen to call this inversion algorithm relGM- 
RESR(SUMR) for the propagator inversion or 
relGMRESR(CG) for the full inversion. This 
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relGMRESR(SUMR) 



nultiplications with Q 




10 12 14 16 18 
multiplications with Q x 1 o" 



Method 


p= 1.86 


p= 1.22 


CG 


1258 


2019 


relCG 


729(1.72) 


1229(1.64) 


relGMRESR(CG) 


312(4.04) 


576(3.51) 


SUMR 


1313 


2489 


relSUMR 


782(1.68) 


1619(1.54) 


relGMRESR(SUMR) 


372(3.52) 


633(3.93) 



Table 1 

Timings for one inversion of ||2J) , or two inversions 
of ^ averaged over three dynamical /3 — 5.6 8'^ 
configurations. The number in brackets is the 
gain from the unrelaxed preconditioned case. 



Method 


p= 1.22 


p = 1.06 


CG 


9022 


31430 


relCG 


5981(1.51) 


18813(1.67) 


relGMRESR(CG) 


2329(3.87) 


6642(4.73) 


SUMR 


8312 


31550 


relSUMR 


6038(1.38) 


18840(1.87) 


relGMRESR(SUMR) 


2252(3.69) 


5974(5.82) 



Table 2 

Same as tabled but on a quenched /3 — 6.0 16^ 
configuration. 



Figure 2. The norm of the residual compared 
to calls to the Wilson operator when solving ^ 
using SUMR, relSUMR and GMRESR(SUMR) 
(top), or when solving (eqn. (jSJ) using CG, 
relCG and relGMRESR(CG). The calculations 
were done on an 8^ configuation with p = 1.86 



preconditioning achieves a gain of at least a fac- 
tor of 4 in computer time (see figure El and tables 
ClandEl). 



5. CONCLUSIONS 

We present algorithms which accelarate the in- 
vesion of the overlap operator by a factor of about 
four for the full inversion (eqn. Q). Using rel- 
GMRESR(SUMR) can give a gain of a factor of at 
least 10 over MINRES for the propagator calcu- 
lation. We expect that the gain will be increased 



on larger systems and at lower masses. 
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